Dynamic load balancing for robust distributed computing in the presence of topological impairments

Authors

  • Majeed M. Hayat
  • Jorge E. Pezoa
  • David Dietz
  • Sagar Dhakal
Abstract

The purpose of any distributed computing system (DCS) is to offer a flexible, reliable and powerful computing platform. With the advances in mobile computing, wireless communications and sensor networks, DCSs have emerged in new applications such as wireless sensor networks (WSNs), military battlefield awareness, surveillance and threat detection, to name a few. These new application areas introduce new challenges to DCSs when operated or deployed in harsh or threat-prone environments. For instance, in WSNs deployed in a military battlefield, the computing elements (CEs) of a DCS join and leave the DCS at any time in a stochastic fashion. More generally, factors such as limited or intermittent communication resources, CEs’ power constraints or long-term physical damage of the CEs can result in random topological changes in the DCS, which, in turn, can severely degrade its performance and reliability. Many of these factors can be attributed to physical attacks on our information infrastructure, of which weapons of mass destruction (WMD) are an important example. This observation has prompted government agencies, such as the Defense Threat Reduction Agency, to launch research initiatives in network science to understand the extent of damage that can be inflicted upon networks in the event of attacks and also to develop strategies to increase the robustness of networks when a threat is present. In this article, we review modern dynamic load balancing (DLB) techniques and their mathematical stochastic models that can be exploited by DCS developers to increase the DCS’s robustness to random topological changes and, at the same time, to efficiently use the available computing resources of the system in the presence of communication uncertainty and CE dysfunction. Two scenarios are considered: one where CEs can fail and recover at random instants and another where CEs can fail permanently.
Under the first scenario we seek to minimize the average response time of a given application. In the second scenario the goal is to maximize the probability of successfully running an entire application. DLB policies are tested using a small-scale DCS environment and compared to theoretical predictions as well as results from Monte-Carlo simulations. The mathematical probabilistic model presented here for network performance is general and can be applied to a broad class of networks and applications.

Technical article Wiley Handbook of Science and Technology for Homeland Security Article ID: XXXX Page 27

A new class of applications, based on sensor networks (SNs), has emerged in recent years. Examples of these applications are habitat monitoring, intrusion detection, defense, military battlefield awareness, structural health monitoring, and scientific exploration. These applications share a particular feature, namely the necessity of collaboration among the sensors. Consequently, the idea of performing distributed computing (DC) naturally appears in an SN environment. In fact, DC is being performed in wireless sensor networks (WSNs) in order to reduce the energy consumption of the set of sensors, make efficient use of network bandwidth, achieve the desired quality of service, and reduce the response time of the entire system. These new applications have generated challenging scenarios, and the resource allocation solutions developed for traditional distributed computing systems (DCSs) are not appropriate under these new scenarios [1]. For instance, when DC is performed in a WSN, classical assumptions on node and communication-link reliability are no longer valid because (i) WSNs are typically deployed in harsh environments where sensors are prone to fail catastrophically; and (ii) in order to save energy, sensors are allowed to turn themselves off at any time.
In addition, assuming that the cost of transmitting data among the sensors is negligible or deterministic is not valid either. Large-scale donation-based distributed infrastructures, such as peer-to-peer networks and donation grids, exhibit a similar kind of behavior: the topology of the DCS changes over time as computing elements (CEs) enter or depart from the DCS in a random fashion. Furthermore, any short-term or long-term damage inflicted on the CEs in these infrastructures adds to this dynamic behavior. Clearly, the new scenarios for DC demand solutions that can adapt to both fluctuations in the workload and changes in the number of CEs available in the DCS. We review here modern resource reallocation techniques that are effective in modern dynamic DCSs [2],[3],[4],[5]. In particular, we describe dynamic load balancing (DLB) policies that can be used to improve the performance of a DCS in the presence of random topological changes. These policies simultaneously attempt to improve the robustness of the DCS and to efficiently use the available CEs in the system. In this article we consider two different performance metrics: the average response time of a software application and the probability of successfully executing an application. These metrics are analytically modeled using the novel concept of stochastic regeneration. Our model takes into account the heterogeneity in the computing capabilities of the CEs, the random failure and recovery times of the CEs, and the random transfer times associated with the communication network. Based upon this model, we devise DLB policies that optimize these two performance metrics. First, we discuss DLB policies suitable for scenarios where CEs can fail and recover at random instants of time, as in the case of DC in WSNs or donation-based DCSs.
Then, we analyze policies for scenarios where CEs can fail permanently, which model long-term physical damage such as that inflicted by weapons of mass destruction (WMD). Our theory is supported by Monte-Carlo (MC) simulations and experimental results collected from a small-scale testbed DCS.

The load-balancing problem in distributed computing

Large, time-consuming applications can be processed by a DCS in a parallel fashion. To this end, applications have to be divided into smaller units, called modules or tasks, which can be processed independently at any CE of the system. Tasks have to be intelligently allocated onto the nodes in order to efficiently use the computing resources available in the DCS. This task allocation is referred to in the literature as load balancing (LB). LB strategies can be divided into static or dynamic, centralized or decentralized, and sender-initiated or receiver-initiated [1]. Static LB is performed before the actual execution of the application in the system. At compile time, based upon statistics of the DCS, the compiler divides the application into several tasks, which are allocated onto the CEs upon the execution of the application. In dynamic load balancing (DLB), both the applications and the LB algorithm execute in the DCS. The DLB algorithm continuously monitors the queue length of the CEs and, upon the detection of an imbalance, triggers the reallocation of tasks among the CEs. When the CEs are prone to fail, the LB algorithm must also monitor the working or failed state of the CEs. In this article we focus on decentralized DLB policies because, in general, DLB policies outperform static LB policies in DCSs where both the system workload and the number of functioning CEs change dynamically with time [1].
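The monitoring step described above can be illustrated with a short Python sketch of how a single CE might decide, from its own view of the system, whether an imbalance exists. This is only a sketch under stated assumptions and not the policy developed in the article: the function names, the `threshold` parameter, and the rule of comparing the local queue with the plain average queue over the CEs believed to be working are all hypothetical choices made for illustration.

```python
from typing import Sequence


def imbalance_at(j: int,
                 perceived_queues: Sequence[int],
                 is_working: Sequence[bool]) -> float:
    """Return how far CE j's queue exceeds the mean queue of working CEs.

    Assumption (not from the article): the imbalance is measured against
    the unweighted average over the CEs that CE j believes are functioning.
    perceived_queues[k] is the queue length of CE k as perceived by CE j.
    """
    working = [q for q, ok in zip(perceived_queues, is_working) if ok]
    # A failed CE, or a system with no working CE, reports no imbalance.
    if not working or not is_working[j]:
        return 0.0
    mean_load = sum(working) / len(working)
    return max(0.0, perceived_queues[j] - mean_load)


def should_trigger_lb(j: int,
                      perceived_queues: Sequence[int],
                      is_working: Sequence[bool],
                      threshold: float = 1.0) -> bool:
    """Hypothetical trigger: act when the imbalance reaches the threshold."""
    return imbalance_at(j, perceived_queues, is_working) >= threshold


if __name__ == "__main__":
    queues = [10, 2, 0]           # queue lengths as perceived by CE 0
    state = [True, True, False]   # CE 2 is believed to have failed
    print(imbalance_at(0, queues, state))
    print(should_trigger_lb(0, queues, state))
```

Excluding failed CEs from the average reflects the requirement that the algorithm track the working or failed state of the CEs; a real policy would also have to account for stale queue information and random transfer delays, which this sketch ignores.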
In order to optimize a given performance metric, any DLB policy has to answer the following fundamental questions at each CE: (i) When does the jth CE have to trigger an LB action? (ii) How many tasks have to be processed at the jth CE? (iii) How many tasks have to be reallocated from the jth CE to the other CEs? The first question is answered by comparing the load at the jth CE with the average load in the system. To this end, information about the number of tasks queued at each CE must be collected by the jth CE. We denote by $\hat{Q}_{j,k}(t)$ the number of tasks queued at the kth CE as perceived by the jth CE at time t. Based upon this information, the jth CE computes its excess load at time t, denoted as $L^{ex}_{j}(t)$, using the formula:

Similar resources

Load Balancing Approaches for Web Servers: A Survey of Recent Trends

Much work has been done on load balancing of web servers in grid environments. The grid environment is popular because it allows access to distributed resources located at remote sites. For effective utilization, load must be balanced among all resources. The importance of load balancing is discussed by distinguishing the system between without load balancing and with loa...

A Sender-Initiated Load Balancing Approach with Genetic Algorithm

Load balancing is a technique in which workload is distributed across multiple computers or other resources to achieve optimal resource utilization, minimize time delay, maximize throughput and avoid overload. The load balancing problem arises mainly in the operation of parallel and distributed computing systems. Load balancing schemes can be characterized as static or dynamic. This paper r...

Triangular Dynamic Architecture for Distributed Computing in a LAN Environment

A computationally intensive large job, granulized to concurrent pieces and operating in a dynamic environment should reduce the total processing time. However, distributing jobs across a networked environment is a tedious and difficult task. Job distribution in a Local Area Network based on Triangular Dynamic Architecture (TDA) is a mechanism that establishes a dynamic environment for job distr...

An Effective Task Scheduling Framework for Cloud Computing using NSGA-II

Cloud computing is a model for convenient on-demand user’s access to changeable and configurable computing resources such as networks, servers, storage, applications, and services with minimal management of resources and service provider interaction. Task scheduling is regarded as a fundamental issue in cloud computing which aims at distributing the load on the different resources of a distribu...

Energy Aware Resource Management of Cloud Data Centers

Cloud Computing, the long-held dream of computing as a utility, has the potential to transform a large part of the IT industry, making software even more attractive as a service and shaping the way IT hardware is designed and purchased. Virtualization technology forms a key concept for new cloud computing architectures. The data centers are used to provide cloud services burdening a significant...

Dynamic Load Balancing in Geographically Distributed Heterogeneous Web Servers

With ever increasing Web traffic, a distributed multi-server Web site can provide scalability and flexibility to cope with growing client demands. Load balancing algorithms to spread the requests across multiple Web servers are crucial to achieve the scalability. Various domain name server (DNS) based schedulers have been proposed in the literature, mainly for multiple homogeneous servers. The pre...

Publication date: 2008